深度神经网络(DNN)模型越来越多地使用新的复制测试数据集进行评估,这些数据集经过精心创建,类似于较旧的和流行的基准数据集。但是,与期望相反,DNN分类模型在这些复制测试数据集上的准确性上表现出显着,一致且在很大程度上无法解释的降解。虽然流行的评估方法是通过利用各自测试数据集中可用的所有数据点来评估模型的准确性,但我们认为这样做会阻碍我们充分捕获DNN模型的行为以及对其准确性的现实期望。因此,我们提出了一种原则性评估协议,该协议适用于在多个测试数据集上对DNN模型的准确性进行比较研究,利用可以使用不同标准(包括与不确定性相关信息)选择的数据点子集进行的子集。通过使用此新评估协议,我们确定了(1)CIFAR-10和Imagenet数据集上$ 564 $ DNN型号的准确性,以及(2)其复制数据集。我们的实验结果表明,已观察到的基准数据集及其复制之间观察到的准确性降解始终较低(即模型在复制测试数据集上的性能更好),而不是在已发表的作品中报告的准确性退化,并依靠这些已发表的作品依赖于常规评估。不利用不确定性相关信息的方法。
translated by 谷歌翻译
虽然ImageNet最初被提出为计算机愿景领域的性能基准的数据集,但它也支持各种其他研究工作。对抗机器学习是一种这样的研究努力,采用欺骗性输入来制作错误的预测。为了评估对抗机器学习领域的攻击和防御,Imagenet仍然是最常用的数据集之一。但是,尚待调查的主题是对抗性实例被错误分类的课程的性质。在本文中,我们对这些错误分类类进行了详细的分析,利用了想象群类层次结构并测量了对逆势示例的不受干扰的起源中上述类别的相对位置。我们发现71%的普遍例子,即实现模型 - 模型对抗性转移性的普遍例子被错误分类为对底层源图像预测的前5个类之一。我们还发现,实际上,大量未确定的错误分类子集实际上是分类到语义上类似的课程。根据这些调查结果,我们讨论在评估未确定的对抗性成功时需要考虑到Imageenet类层次结构。此外,我们倡导未来的研究努力,以合并分类信息。
translated by 谷歌翻译
虽然近年来,深度神经网络(DNN)的采用率大幅增加,但尚未发现对对抗对抗例子的脆弱性的解决方案。因此,大量的研究工作致力于解决这种弱点,许多研究通常使用源图像的子集来生成对抗示例,将该子集中的每个图像视为相等。我们证明,实际上,不是每个来源图像都同样适用于这种评估。为此,我们设计了一个大规模的模型到模型转移性方案,我们通过利用三种最常用的攻击来精心分析来自想象成中的每个合适的源图像中的每个合适的源图像。在这种可转移性方案中,这涉及七种不同的DNN模型,包括最近提出的视觉变压器,我们揭示了在模型到模型转移性成功中获得高达12.5美元的差异,平均为1.01美元L_2 $扰动,平均每平均$ 0.03 $($ 8/225 $),当所有合适的候选人中随机采样1000美元的源图像时,每次$ 0.03 $($ 8/225 $)。然后,我们采取一个第一个步骤评估用于创造逆势示例的图像的稳健性,提出了许多简单但有效的方法来识别不合适的源图像,从而使得可以减轻实验中的极端情况并支持高质量的基准测试。
translated by 谷歌翻译
每年,AEDESAEGYPTI蚊子都感染了数百万人,如登录,ZIKA,Chikungunya和城市黄热病等疾病。战斗这些疾病的主要形式是通过寻找和消除潜在的蚊虫养殖场来避免蚊子繁殖。在这项工作中,我们介绍了一个全面的空中视频数据集,获得了无人驾驶飞行器,含有可能的蚊帐。使用识别所有感兴趣对象的边界框手动注释视频数据集的所有帧。该数据集被用于开发基于深度卷积网络的这些对象的自动检测系统。我们提出了通过在可以注册检测到的对象的时空检测管道的对象检测流水线中的融合来利用视频中包含的时间信息,这些时间是可以注册检测到的对象的,最大限度地减少最伪正和假阴性的出现。此外,我们通过实验表明使用视频比仅使用框架对马赛克组成马赛克更有利。使用Reset-50-FPN作为骨干,我们可以分别实现0.65和0.77的F $ _1 $ -70分别对“轮胎”和“水箱”的对象级别检测,说明了正确定位潜在蚊子的系统能力育种对象。
translated by 谷歌翻译
Advances in computer vision and machine learning techniques have led to significant development in 2D and 3D human pose estimation from RGB cameras, LiDAR, and radars. However, human pose estimation from images is adversely affected by occlusion and lighting, which are common in many scenarios of interest. Radar and LiDAR technologies, on the other hand, need specialized hardware that is expensive and power-intensive. Furthermore, placing these sensors in non-public areas raises significant privacy concerns. To address these limitations, recent research has explored the use of WiFi antennas (1D sensors) for body segmentation and key-point body detection. This paper further expands on the use of the WiFi signal in combination with deep learning architectures, commonly used in computer vision, to estimate dense human pose correspondence. We developed a deep neural network that maps the phase and amplitude of WiFi signals to UV coordinates within 24 human regions. The results of the study reveal that our model can estimate the dense pose of multiple subjects, with comparable performance to image-based approaches, by utilizing WiFi signals as the only input. This paves the way for low-cost, broadly accessible, and privacy-preserving algorithms for human sensing.
translated by 谷歌翻译
Due to the environmental impacts caused by the construction industry, repurposing existing buildings and making them more energy-efficient has become a high-priority issue. However, a legitimate concern of land developers is associated with the buildings' state of conservation. For that reason, infrared thermography has been used as a powerful tool to characterize these buildings' state of conservation by detecting pathologies, such as cracks and humidity. Thermal cameras detect the radiation emitted by any material and translate it into temperature-color-coded images. Abnormal temperature changes may indicate the presence of pathologies, however, reading thermal images might not be quite simple. This research project aims to combine infrared thermography and machine learning (ML) to help stakeholders determine the viability of reusing existing buildings by identifying their pathologies and defects more efficiently and accurately. In this particular phase of this research project, we've used an image classification machine learning model of Convolutional Neural Networks (DCNN) to differentiate three levels of cracks in one particular building. The model's accuracy was compared between the MSX and thermal images acquired from two distinct thermal cameras and fused images (formed through multisource information) to test the influence of the input data and network on the detection results.
translated by 谷歌翻译
The advances in Artificial Intelligence are creating new opportunities to improve lives of people around the world, from business to healthcare, from lifestyle to education. For example, some systems profile the users using their demographic and behavioral characteristics to make certain domain-specific predictions. Often, such predictions impact the life of the user directly or indirectly (e.g., loan disbursement, determining insurance coverage, shortlisting applications, etc.). As a result, the concerns over such AI-enabled systems are also increasing. To address these concerns, such systems are mandated to be responsible i.e., transparent, fair, and explainable to developers and end-users. In this paper, we present ComplAI, a unique framework to enable, observe, analyze and quantify explainability, robustness, performance, fairness, and model behavior in drift scenarios, and to provide a single Trust Factor that evaluates different supervised Machine Learning models not just from their ability to make correct predictions but from overall responsibility perspective. The framework helps users to (a) connect their models and enable explanations, (b) assess and visualize different aspects of the model, such as robustness, drift susceptibility, and fairness, and (c) compare different models (from different model families or obtained through different hyperparameter settings) from an overall perspective thereby facilitating actionable recourse for improvement of the models. It is model agnostic and works with different supervised machine learning scenarios (i.e., Binary Classification, Multi-class Classification, and Regression) and frameworks. It can be seamlessly integrated with any ML life-cycle framework. Thus, this already deployed framework aims to unify critical aspects of Responsible AI systems for regulating the development process of such real systems.
translated by 谷歌翻译
Model calibration, which is concerned with how frequently the model predicts correctly, not only plays a vital part in statistical model design, but also has substantial practical applications, such as optimal decision-making in the real world. However, it has been discovered that modern deep neural networks are generally poorly calibrated due to the overestimation (or underestimation) of predictive confidence, which is closely related to overfitting. In this paper, we propose Annealing Double-Head, a simple-to-implement but highly effective architecture for calibrating the DNN during training. To be precise, we construct an additional calibration head-a shallow neural network that typically has one latent layer-on top of the last latent layer in the normal model to map the logits to the aligned confidence. Furthermore, a simple Annealing technique that dynamically scales the logits by calibration head in training procedure is developed to improve its performance. Under both the in-distribution and distributional shift circumstances, we exhaustively evaluate our Annealing Double-Head architecture on multiple pairs of contemporary DNN architectures and vision and speech datasets. We demonstrate that our method achieves state-of-the-art model calibration performance without post-processing while simultaneously providing comparable predictive accuracy in comparison to other recently proposed calibration methods on a range of learning tasks.
translated by 谷歌翻译
Dataset scaling, also known as normalization, is an essential preprocessing step in a machine learning pipeline. It is aimed at adjusting attributes scales in a way that they all vary within the same range. This transformation is known to improve the performance of classification models, but there are several scaling techniques to choose from, and this choice is not generally done carefully. In this paper, we execute a broad experiment comparing the impact of 5 scaling techniques on the performances of 20 classification algorithms among monolithic and ensemble models, applying them to 82 publicly available datasets with varying imbalance ratios. Results show that the choice of scaling technique matters for classification performance, and the performance difference between the best and the worst scaling technique is relevant and statistically significant in most cases. They also indicate that choosing an inadequate technique can be more detrimental to classification performance than not scaling the data at all. We also show how the performance variation of an ensemble model, considering different scaling techniques, tends to be dictated by that of its base model. Finally, we discuss the relationship between a model's sensitivity to the choice of scaling technique and its performance and provide insights into its applicability on different model deployment scenarios. Full results and source code for the experiments in this paper are available in a GitHub repository.\footnote{https://github.com/amorimlb/scaling\_matters}
translated by 谷歌翻译
Over the past decade, neural networks have been successful at making predictions from biological sequences, especially in the context of regulatory genomics. As in other fields of deep learning, tools have been devised to extract features such as sequence motifs that can explain the predictions made by a trained network. Here we intend to go beyond explainable machine learning and introduce SEISM, a selective inference procedure to test the association between these extracted features and the predicted phenotype. In particular, we discuss how training a one-layer convolutional network is formally equivalent to selecting motifs maximizing some association score. We adapt existing sampling-based selective inference procedures by quantizing this selection over an infinite set to a large but finite grid. Finally, we show that sampling under a specific choice of parameters is sufficient to characterize the composite null hypothesis typically used for selective inference-a result that goes well beyond our particular framework. We illustrate the behavior of our method in terms of calibration, power and speed and discuss its power/speed trade-off with a simpler data-split strategy. SEISM paves the way to an easier analysis of neural networks used in regulatory genomics, and to more powerful methods for genome wide association studies (GWAS).
translated by 谷歌翻译